Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feerate histogram to getmempoolinfo #21422

Closed

Conversation

kiminuo
Copy link
Contributor

@kiminuo kiminuo commented Mar 12, 2021

This PR is a slightly modified version of #15836 (jonasschnelli):

This follows the approach of adding statistical information to Bitcoin Core that would otherwise be inefficient to calculate outside of the codebase.

Adds an optional feerate histogram to getmempoolinfo.

The concept and code is heavily inspired by the stats jhoenicke runs (https://github.com/jhoenicke/mempool, http://bitcoin-mempool.info).

If someone has a good idea how to make the feerate-groups dynamic but also semi-constant for similar fee environments, please comment.

If this is feature we'd like to have in master (concept ACKs), I'd continue this with writing tests.

A simple plot of the data is here.
RPC output sample is here.

My attempts to contact jonasschnelli were, unfortunately, unsuccessful so I decided to create this PR in an attempt to move this forward. If this is somehow problematic, please let me know to work it out. edit: Jonas is happy the work continues.

This PR

Note that REST support which is in #15836 is not included in this PR. It can be improved in this PR or in a follow-up one if it is deemed useful/required/etc.

Applied review comments from the old PR

Test commands

$ ./bitcoin-cli -testnet getmempoolinfo "[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 140, 170, 200]" # To test the new behavior
$ test/functional/test_runner.py mempool_fee_histogram.py # To run the new test
$ ./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test help getmempoolinfo
getmempoolinfo ( [fee_rate,...] )

Returns details on the active state of the TX memory pool.

Arguments:
1. fee_histogram    (json array, optional) Fee statistics grouped by fee rate ranges
     [
       fee_rate,    (numeric, required) Fee rate (in sat/vB) to group the fees by
       ...
     ]

Result:
{                               (json object)
  "loaded" : true|false,        (boolean) True if the mempool is fully loaded
  "size" : n,                   (numeric) Current tx count
  "bytes" : n,                  (numeric) Sum of all virtual transaction sizes as defined in BIP 141. Differs from actual serialized size because witness data is discounted
  "usage" : n,                  (numeric) Total memory usage for the mempool
  "total_fee" : n,              (numeric) Total fees for the mempool in BTC, ignoring modified fees through prioritizetransaction
  "maxmempool" : n,             (numeric) Maximum memory usage for the mempool
  "mempoolminfee" : n,          (numeric) Minimum fee rate in BTC/kvB for tx to be accepted. Is the maximum of minrelaytxfee and minimum mempool fee
  "minrelaytxfee" : n,          (numeric) Current minimum relay fee for transactions
  "unbroadcastcount" : n,       (numeric) Current number of transactions that haven't passed initial broadcast yet
  "fee_histogram" : {           (json object)
    "fee_rate_groups" : {       (json object)
      "<fee_rate_group>" : {    (json object) Fee rate group named by its lower bound (in sat/vB), identical to the "from" field below
        "size" : n,             (numeric) Cumulative size of all transactions in the fee rate group (in vBytes)
        "count" : n,            (numeric) Number of transactions in the fee rate group
        "fees" : n,             (numeric) Cumulative fees of all transactions in the fee rate group (in sat)
        "from" : n             (numeric) Group contains transactions with fee rates equal or greater than this value (in sat/vB)
      },
      ...
    },
    "total_fees" : n            (numeric) Total available fees in mempool (in sat)
  }
}

Examples:
> bitcoin-cli getmempoolinfo
> bitcoin-cli getmempoolinfo "[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 140, 170, 200]"
> curl --user myusername --data-binary '{"jsonrpc": "1.0", "id": "curltest", "method": "getmempoolinfo", "params": []}' -H 'content-type: text/plain;' http://127.0.0.1:8332/
> curl --user myusername --data-binary '{"jsonrpc": "1.0", "id": "curltest", "method": "getmempoolinfo", "params": [[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 140, 170, 200]]}' -H 'content-type: text/plain;' http://127.0.0.1:8332/

Output on testnet (2022-07-09)

./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo "[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 140, 170, 200]"
JSON output
{
"loaded": true,
"size": 10,
"bytes": 2652,
"usage": 13504,
"total_fee": 0.00363010,
"maxmempool": 300000000,
"mempoolminfee": 0.00001000,
"minrelaytxfee": 0.00001000,
"unbroadcastcount": 0,
"fee_histogram": {
  "fee_rate_groups": {
    "0": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 0
    },
    "1": {
      "size": 403,
      "count": 2,
      "fees": 403,
      "from": 1
    },
    "2": {
      "size": 554,
      "count": 2,
      "fees": 561,
      "from": 2
    },
    "3": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 3
    },
    "4": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 4
    },
    "5": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 5
    },
    "6": {
      "size": 255,
      "count": 1,
      "fees": 1345,
      "from": 6
    },
    "7": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 7
    },
    "8": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 8
    },
    "10": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 10
    },
    "12": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 12
    },
    "14": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 14
    },
    "17": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 17
    },
    "20": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 20
    },
    "25": {
      "size": 352,
      "count": 1,
      "fees": 9505,
      "from": 25
    },
    "30": {
      "size": 144,
      "count": 1,
      "fees": 4520,
      "from": 30
    },
    "40": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 40
    },
    "50": {
      "size": 374,
      "count": 1,
      "fees": 20000,
      "from": 50
    },
    "60": {
      "size": 351,
      "count": 1,
      "fees": 23937,
      "from": 60
    },
    "70": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 70
    },
    "80": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 80
    },
    "100": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 100
    },
    "120": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 120
    },
    "140": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 140
    },
    "170": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 170
    },
    "200": {
      "size": 219,
      "count": 1,
      "fees": 302739,
      "from": 200
    }
  },
  "total_fees": 363010
}
}

Various inputs

./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo "[0]"       # OK, one fee rate group covering all possible transaction fee rates (i.e. `>=0`); total_fee = fee_histogram.total_fees
./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo "[-1]"      # reports: Non-negative values are expected
./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo "[0,0]"     # reports: Strictly increasing values are expected
./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo "[0,1,0]"   # reports: Strictly increasing values are expected
./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo "[0, 1000000000000000000000000000000000]" # reports: JSON integer out of range
./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo '[1.2,2,3]' # reports: JSON integer out of range
./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo '["2"]'     # reports: Error parsing JSON: ['2']

@promag
Copy link
Member

promag commented Mar 12, 2021

Hey @kiminuo, can you amend the commit and remove my mention? Keep in mind that mentions in commits lead to notifications. Ty.

@kiminuo
Copy link
Contributor Author

kiminuo commented Mar 12, 2021

Hey @kiminuo, can you amend the commit and remove my mention? Keep in mind that mentions in commits lead to notifications. Ty.

Yes, sorry for that.

@jonasschnelli
Copy link
Contributor

Thanks for picking this up.

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 16, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept NACK 0xB10C
Concept ACK jonatack, JeremyRubin, ghost, jamesob, molnard, sipa
Stale ACK stratospher, kristapsk
Ignored review glozow

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #26525 (Remove -mempoolfullrbf option by BitcoinErrorLog)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@kiminuo kiminuo force-pushed the feature/2021-03-Feerate-histogram branch 3 times, most recently from 476f660 to 81cd4bb Compare March 21, 2021 09:55
@jonatack
Copy link
Contributor

Concept ACK. For the test commit, maybe a more descriptive title that can be understood on its own.

Copy link
Contributor

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few initial comments.

src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
test/functional/mempool_fee_histogram.py Outdated Show resolved Hide resolved
test/functional/test_runner.py Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
@kiminuo
Copy link
Contributor Author

kiminuo commented Mar 21, 2021

@jonatack Thanks for the superb review! I'm working on incorporating your suggestions.

@kiminuo kiminuo force-pushed the feature/2021-03-Feerate-histogram branch 2 times, most recently from b2f8ddb to 3d923f4 Compare March 23, 2021 17:36
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
@kiminuo kiminuo force-pushed the feature/2021-03-Feerate-histogram branch 3 times, most recently from 2a05b39 to fb0824c Compare March 24, 2021 19:11
Copy link
Contributor

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for switching it from BTC/kvB to sat/vB!

I looked over this more thoroughly and saw quite a few remaining fixups and suggestions. One thing to ensure is that the help docs are the same as the actual output and use the correct fee rate units and arg types.

It would be good if the functional test actually verified that the values are correctly calculated after creating a few txns; ATM it is only really a smoke test that verifies the output structure.

suggestions

diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
index 8d7ce13e57..fbae7cdbaf 100644
--- a/src/rpc/blockchain.cpp
+++ b/src/rpc/blockchain.cpp
@@ -1512,7 +1512,7 @@ UniValue MempoolInfoToJSON(const CTxMemPool& pool, const std::optional<MempoolHi
          * ... cumulated fees */
         std::vector<uint64_t> sizes(limits.size(), 0);
         std::vector<uint64_t> count(limits.size(), 0);
-        std::vector<uint64_t> fees(limits.size(), 0);
+        std::vector<CAmount> fees(limits.size(), 0);
 
         for (const CTxMemPoolEntry& e : pool.mapTx) {
             const CAmount fee{e.GetFee()};
@@ -1528,15 +1528,19 @@ UniValue MempoolInfoToJSON(const CTxMemPool& pool, const std::optional<MempoolHi
             CHECK_NONFATAL(dsize > 0);
             CHECK_NONFATAL(asize + dsize - size > 0);

-            CAmount fpb{fee / size};     // fee per byte
-            CAmount afpb{afees / asize}; // fee per byte including ancestors
-            CAmount dfpb{dfees / dsize}; // fee per byte including descendants
-            CAmount tfpb{(afees + dfees - fee) / (asize + dsize - size)};
-            CAmount feeperbyte{std::max(std::min(dfpb, tfpb), std::min(fpb, afpb))};
+            const CAmount fpb{fee / size};     // fee per byte
+            const CAmount afpb{afees / asize}; // fee per byte including ancestors
+            const CAmount dfpb{dfees / dsize}; // fee per byte including descendants
+            const CAmount tfpb{(afees + dfees - fee) / (asize + dsize - size)};
+            const CAmount fee_per_byte{std::max(std::min(dfpb, tfpb), std::min(fpb, afpb))};
+ 
+            assert(fpb == CFeeRate(fee, size).GetFee(1));
+            assert(afpb == CFeeRate(afees, asize).GetFee(1));
+            assert(dfpb == CFeeRate(dfees, dsize).GetFee(1));
+            assert(tfpb == CFeeRate(afees + dfees - fee, asize + dsize - size).GetFee(1));
+
-            // Distribute feerates into feelimits
+            // Distribute fee rates into fee limits
             for (int i = limits.size() - 1; i >= 0; --i) {
-                if (feeperbyte >= limits[i]) {
+                if (fee_per_byte >= limits[i]) {
                     sizes[i] += size;
                     ++count[i];
                     fees[i] += fee;
@@ -1545,23 +1549,23 @@ UniValue MempoolInfoToJSON(const CTxMemPool& pool, const std::optional<MempoolHi
             }
         }
 
         UniValue ranges(UniValue::VOBJ);
         for (size_t i = 0; i < limits.size(); ++i) {
             UniValue info_sub(UniValue::VOBJ);
-            info_sub.pushKV("sizes", sizes[i]);
-            info_sub.pushKV("count", count[i]);
-            info_sub.pushKV("fees", fees[i]);
-            info_sub.pushKV("from_feerate", limits[i]);
+            info_sub.pushKV("size", sizes.at(i));
+            info_sub.pushKV("count", count.at(i));
+            info_sub.pushKV("fees", fees.at(i));
+            info_sub.pushKV("from", limits.at(i));
 
             if (i == limits.size() - 1) {
-                info_sub.pushKV("to_feerate", "Max"); // TODO.
+                info_sub.pushKV("to", "max"); // TODO.
             } else {
-                info_sub.pushKV("to_feerate", limits[i + 1]);
+                info_sub.pushKV("to", limits[i + 1]);
             }
 
-            total_fees += fees[i];
-            ranges.pushKV(ToString(limits[i]), info_sub);
+            total_fees += fees.at(i);
+            ranges.pushKV(ToString(limits.at(i)), info_sub);
         }
 
         UniValue info(UniValue::VOBJ);
@@ -1580,7 +1584,7 @@ static RPCHelpMan getmempoolinfo()
                 {
                     {"fee_histogram", RPCArg::Type::ARR, RPCArg::Optional::OMITTED_NAMED_ARG, "Fee statistics grouped by fee rate ranges",
                         {
-                            {"fee_rate", RPCArg::Type::NUM, RPCArg::Optional::NO, "Fee rate (in satoshis) to group the fees by"},
+                            {"fee_rate", RPCArg::Type::NUM, RPCArg::Optional::NO, "Fee rate (in " + CURRENCY_ATOM + "/vB) to group the fees by"},
                         },
                     },
                 },
@@ -1591,31 +1595,31 @@ static RPCHelpMan getmempoolinfo()
                         {RPCResult::Type::NUM, "size", "Current tx count"},
                         {RPCResult::Type::NUM, "bytes", "Sum of all virtual transaction sizes as defined in BIP 141. Differs from actual serialized size because witness data is discounted"},
                         {RPCResult::Type::NUM, "usage", "Total memory usage for the mempool"},
-                        {RPCResult::Type::STR_AMOUNT, "total_fee", "Total fees for the mempool in " + CURRENCY_UNIT + ", ignoring modified fees through prioritizetransaction"},
+                        {RPCResult::Type::STR_AMOUNT, "total_fee", "Total fees for the mempool in " + CURRENCY_UNIT + "/kvB, ignoring modified fees through prioritizetransaction"},
                         {RPCResult::Type::NUM, "maxmempool", "Maximum memory usage for the mempool"},
-                        {RPCResult::Type::STR_AMOUNT, "mempoolminfee", "Minimum fee rate in " + CURRENCY_UNIT + "/kB for tx to be accepted. Is the maximum of minrelaytxfee and minimum mempool fee"},
+                        {RPCResult::Type::STR_AMOUNT, "mempoolminfee", "Minimum fee rate in " + CURRENCY_UNIT + "/kvB for tx to be accepted. Is the maximum of minrelaytxfee and minimum mempool fee"},
                         {RPCResult::Type::STR_AMOUNT, "minrelaytxfee", "Current minimum relay fee for transactions"},
                         {RPCResult::Type::NUM, "unbroadcastcount", "Current number of transactions that haven't passed initial broadcast yet"},
                         {RPCResult::Type::OBJ, "fee_histogram", "",
                             {
-                                {RPCResult::Type::OBJ_DYN, "ranges", "Feerate groups",
+                                {RPCResult::Type::OBJ_DYN, "ranges", "",
                                 {
-                                    {RPCResult::Type::OBJ, "<feerate-group>", "Object per feerate group",
+                                    {RPCResult::Type::OBJ, "<fee_rate_group>", "Fee rate group named by its lower bound (in " + CURRENCY_ATOM + "/vB), identical to the \"from\" field below",
                                     {
-                                        {RPCResult::Type::NUM, "size", "Cumulated size of all transactions in feerate group"},
-                                        {RPCResult::Type::NUM, "count", "Amount of transactions in feerate group"},
-                                        {RPCResult::Type::NUM, "fees", "Cumulated fees of all transactions in feerate group (in satoshis)"},
-                                        {RPCResult::Type::NUM, "from_feerate", "Group contains transactions with feerates equal or greater than this value (in satoshis)"},
-                                        {RPCResult::Type::NUM, "to_feerate", "Group contains transactions with feerates less than this value (in satoshis)"},
+                                        {RPCResult::Type::NUM, "size", "Cumulative size of all transactions in the fee rate group"},
+                                        {RPCResult::Type::NUM, "count", "Number of transactions in the fee rate group"},
+                                        {RPCResult::Type::NUM, "fees", "Cumulative fees of all transactions in the fee rate group (in " + CURRENCY_ATOM + "/vB)"},
+                                        {RPCResult::Type::NUM, "from", "Group contains transactions with fee rates equal or greater than this value (in " + CURRENCY_ATOM + "/vB)"},
+                                        {RPCResult::Type::NUM, "to", "Group contains transactions with fee rates less than this value (in " + CURRENCY_ATOM + "/vB)"},
                                     }}}},
-                                {RPCResult::Type::NUM, "total_fees", "Total available fees in mempool (in satoshis)"},
+                                {RPCResult::Type::NUM, "total_fees", "Total available fees in mempool (in " + CURRENCY_ATOM + "/vB)"},
                             }},
                     }},
                 RPCExamples{
                     HelpExampleCli("getmempoolinfo", "") +
-                    HelpExampleCli("getmempoolinfo", R"([1, 10, 100, 200, 400, 800])") +
+                    HelpExampleCli("getmempoolinfo", R"('[1, 5, 10, 25, 50, 100]')") +
                     HelpExampleRpc("getmempoolinfo", "") +
-                    HelpExampleRpc("getmempoolinfo", R"([1, 10, 100, 200, 400, 800])")
+                    HelpExampleRpc("getmempoolinfo", R"([1, 5, 10, 25, 50, 100])")
                 },

diff --git a/test/functional/mempool_fee_histogram.py b/test/functional/mempool_fee_histogram.py
index 0cbaa97bfc..91c6dccc30 100755
--- a/test/functional/mempool_fee_histogram.py
+++ b/test/functional/mempool_fee_histogram.py
@@ -31,15 +31,15 @@ class MempoolFeeHistogramTest(BitcoinTestFramework):
         total_fees = 0
 
         for key, bin in info['fee_histogram']['ranges'].items():
-            assert_equal(int(key), bin['from_feerate'])
+            assert_equal(int(key), bin['from'])
             if bin['fees'] > 0:
                 assert_greater_than(bin['count'], 0)
             else:
                 assert_equal(bin['count'], 0)
             assert_greater_than_or_equal(bin['fees'], 0)
-            assert_greater_than_or_equal(bin['sizes'], 0)
-            if bin['to_feerate'] != 'Max':
-                assert_greater_than(bin['to_feerate'], bin['from_feerate'])
+            assert_greater_than_or_equal(bin['size'], 0)
+            if bin['to'] != 'max':
+                assert_greater_than(bin['to'], bin['from'])
             total_fees += bin['fees']

src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
@kiminuo kiminuo force-pushed the feature/2021-03-Feerate-histogram branch from fb0824c to ccb043c Compare March 26, 2021 09:00
@kiminuo
Copy link
Contributor Author

kiminuo commented Mar 26, 2021

It would be good if the functional test actually verified that the values are correctly calculated after creating a few txns; ATM it is only really a smoke test that verifies the output structure.

This is on my TODO list. So I will improve it over coming days.

src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
@maflcko
Copy link
Member

maflcko commented Mar 21, 2023

How can we expect the Bitcoin Core wallet to behave reasonably if it can't predict the fees well?

The Bitcoin Core wallet does not use this feature?

@jonasschnelli
Copy link
Contributor

jonasschnelli commented Mar 21, 2023

The original idea of this patch was to add a small and easy to maintain efficient way to generate a fee histogram to bitcoin core (see original PR description #15836 (comment)).

As described there, it was always possible to use getrawmempool (@jhoenicke script) to achieve this goal.

However, it is not efficient to dumb out all transactions of the complete mempool in a json format just to calculate the fee histogram with a python script.

Thus I think this is a valuable addition to Bitcoin Core.

The patch size is minimal and it seems like there is some interest in this feature.

EDIT: this PR has nothing to do with the Bitcoin cores internal wallet. It's pure mempool statistics.

@kristapsk
Copy link
Contributor

How can we expect the Bitcoin Core wallet to behave reasonably if it can't predict the fees well?

The Bitcoin Core wallet does not use this feature?

Does not use yet?

@prusnak
Copy link
Contributor

prusnak commented Mar 21, 2023

The Bitcoin Core wallet does not use this feature?

My point is that a wallet without fee estimation is not very usable and harms the overall network. If we don't want to have an effective way of determining the fee in Bitcoin Core, we might as well remove the entire wallet from the codebase.

@glozow
Copy link
Member

glozow commented Mar 21, 2023

There seems to be confusion here. This PR does not touch the fee estimator or the wallet in any way. The fee estimator already keeps track of how many transactions at specific feerates {in the mempool, have confirmed within N blocks, haven't confirmed within N blocks} there are internally; the wallet queries it directly for feerate estimates. I don't know of any examples where the wallet needs to see a graph of something.

If you feel that the fee estimation algorithms need improvement, by all means I'd say some review/PRs there would be welcome. Maybe take a look at #25380 or #21161.

For those that feel very passionately about improving how the wallet calculates fees, my personal recommendation would be to review #26152 to have the strongest impact.

@0xB10C
Copy link
Contributor

0xB10C commented Mar 21, 2023

It's confusing to me why this PR, tagged as [RPC/REST/ZMQ] and only touching src/rest.cpp and files in src/rpc/*, should have anything to do with the wallet or the GUI. AFAIK the wallet doesn't use the RPC interface to communicate with bitcoind. If the wallet or the GUI would want a histogram, it couldn't use the code added in this PR.

edit: glozow was 1min faster

@kristapsk
Copy link
Contributor

If the wallet or the GUI would want a histogram, it couldn't use the code added in this PR.

That would require just moving fee histogram code out of MempoolInfoToJSON() to a separate function somewhere out of rpc/, code itself seems to me reusable.

@sipa
Copy link
Member

sipa commented Mar 21, 2023

Bitcoin Core has had fee estimation logic for a long time (estimatefee RPC was added in 2014, replaced with estimatesmartfee in 2015).

The Bitcoin Core wallet automatically uses this internal feerate estimation logic to decide on the fees of transactions. You need to override things, or use more low-level RPCs, to not use it.

That feerate estimation logic works by looking at how quickly mempool transactions get confirmed, over longer time windows, not by looking at the current mempool composition (looking at just your own mempool may give very bad estimates if your relay policy isn't exactly what the network and miners use, which is very hard to guarantee).

This PR is about exposing mempool feerate histogram data through RPC. It's not a feerate estimator, and is entirely unrelated to the wallet. While the logic could in theory be used for feerate estimation too, that's not what this PR does, and I'm concerned that doing so would risk giving very inaccurate results when faced with diverging network policies.

Conceptually, however, I think it's reasonable to add something like this, as just RPC (or GUI) output. It can be informative for users, even if it can't be directly used for automatic fee estimation. Of course, that's subject to reviewer interest.

@ghost
Copy link

ghost commented Mar 21, 2023

I am not against sipa as I understand fee estimation is difficult however estimatesmartfee sucks and could be improved.

There are lot of occasions when I found it to be irrelevant and just look at mempool.space or multiple explorers

One of the project that I found more useful: https://github.com/TrueLevelSA/btc-congestion-manager

Anyway, this PR tries to improve how projects use mempool info and should not have been controversial at any point.

Request to all developers including maintainers to re consider their opinion.

@nopara73
Copy link

nopara73 commented Mar 22, 2023

For Wasabi's purposes this output is needed purely for fee estimation, so it'd be ideal if Core would estimate fees properly. Here's how I see things:

  • estimatesmartfee estimates fees based on the PAST
  • This PR enables us to put minimums on Core's fee estimations. Eg. if Core estimates 6 hours at 10 sat/b, based on the past, but the mempool is already filled with 12 hours of 11 sat/b estimations, then what we do (currently with Knots) is that based on the mempool histogram we upgrade Core's estimations accordingly. This'd mean we're finally considering the PRESENT
  • Finally, for future work we can also aim somehow at predicting the FUTURE (neuro net, AI stuff? Update: I think this repo by @djkazic wanna do something like this: https://github.com/djkazic/matrix)

@sipa
Copy link
Member

sipa commented Mar 22, 2023

@nopara73 I'm aware of estimatesmartfee's limitations, and understand the desire for something based on current mempool composition that can react faster to current conditions. The problem is that looking your own mempool only gives a reliable result in case you're running with policies that more or less match what miners are doing, and if not, becomes easily gameable. That is the reason why estimatesmartfee intentionally does not use mempool composition.

As I said, I think an RPC like this is valuable for people who want additional information for fee estimation, but using it automatically without at least someone looking at the result seems dangerous to me. E.g. future softforks your node doesn't know about could permit an attacker to stuff your mempool with high-fee unminable transactions.

@prusnak
Copy link
Contributor

prusnak commented Mar 22, 2023

@sipa By not providing sort-of accurate estimations from the current mempool, we force people to rely on third-party estimators, which is arguably much worse than having an independent estimation which works most of the time, although it may be games under certain specific conditions. I think that "perfect is enemy of good" rule applies here perfectly.

@nopara73
Copy link

@sipa sorry for the late reply, at first I was like "yes, that makes perfect sense," however an argument came to my mind - this morning during shower :) - in favor of it, so I'd like to share that.

Can't we assume miners are seeking the highest fee transactions at all times? In other words: maximize their income from fees. If we could, then adjusting estimatesmartfee results based on the possibility of block acceptance of the current mempool state of the node does not seem gameable to me, because by extension we could assume that the highest fee transactions we have are precisely the ones that the miners seek.

Now I may argue against this with "it is not guaranteed that the miners have the highest fee txs that my node does as well."

To this, first I'd say: we can guarantee they want to have them and isn't that enough?
Then I'd turn this around and look at it from an attacker's point of view: the way to game this would be to create transactions those have high fees, get to the targeted node, but don't get to the miners. And now I'd point out the attacker cannot guarantee his high fee transactions don't get to the miners, especially with fullrbf around. This makes the attack super expensive, since in order to make the node think the fees are very high, the attacker has to create many blocks of transactions and guarantee that only the targeted node will get to know of these transactions.

In summary, unless I am missing the point and you're referring to another kind of game/attack here that does not involve the attacker creating blocks of high fee transactions, we can be reassured the attack is too expensive to be worth trying.

@craigraw
Copy link

Using getrawmempool true is unfortunately not a solution that can be applied generally to solve this use case. On systems with low memory (4Gb or less, for example a Raspberry Pi or the Futurebit Apollo) this RPC call causes the kernel to terminate the Bitcoin Core process (OOM) when the mempool is reasonably full. This occurs consistently on all released versions AFAIK.

One workaround is to use getrawmempool false followed by repeated calls to getmempoolentry. This of course is even more inefficient, leading any reasonable client implementation to maintain a replica of mempool entries to avoid having to load them all, one by one, every time a fee rate histogram is required. This trades better performance for significantly increased memory utilisation.

It's a pity the PR has been closed since the complexity is low, there is clearly wide interest for it across many wallets, and a fee rate histogram is a useful tool in the toolbox when estimating fees. In addition, any middleware implementing the Electrum protocol needs to use one of the inefficient methods described to implement the mempool.get_fee_histogram call.

Request to all developers including maintainers to re consider their opinion.

+1.

luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Jun 27, 2023
Co-authored-by: Jonas Schnelli <dev@jonasschnelli.ch>
Co-authored-by: Jon Atack <jon@atack.com>

Github-Pull: bitcoin#21422
Rebased-From: 0b87ba9
@sipa
Copy link
Member

sipa commented Jun 28, 2023

I think the discussion here is conflating two things:

  • Problems with Bitcoin Core's fee estimation algorithm.
  • The lack of a fee histogram RPC.

It's clear that the existing fee estimation algorithms make it inaccurate for many real use cases. There are good reasons for why it works the way it does, but that doesn't change the fact that the outcome is bad. We can certainly discuss what can be done about that. I've posted about one possible approach in #27995.

However, this PR is about something else; adding a feerate histogram. That seems to me like a vaguely reasonable, independent feature, but it has nothing to do with our fee estimation algorithm. And wanting a feerate histogram so you can implement fee estimation yourself just because the fee estimation feature itself does a bad job seems to be the wrong way around. If that's the goal we should just think about how to improve the algorithm.

@craigraw
Copy link

To be clear, my desire for an efficient way to retrieve fee rate histogram data has nothing to do with Bitcoin Core's fee estimation algorithm. Because fee estimation is inherently difficult and no algorithm can know the user's time preference for every transaction, Sparrow Wallet displays the fee rate histogram chart in the UI as another "tool in the toolbox". This allows a user to observe trends and adjust the fee rate to place a transaction in a particular fee rate band to suit their time preference. This is not to suggest such an approach is always superior to algorithmic fee rate estimation, merely that more data is generally useful when trying to predict the future.

@maflcko
Copy link
Member

maflcko commented Jun 29, 2023

One workaround is to use getrawmempool false followed by repeated calls to getmempoolentry. This of course is even more inefficient, leading any reasonable client implementation to maintain a replica of mempool entries to avoid having to load them all, one by one, every time a fee rate histogram is required. This trades better performance for significantly increased memory utilisation.

I think the getmempoolentry can be batched for a slight improvement. Also, you don't have to store the full entry in the application, just the txid and fee info would be enough? Finally, you'd only have to call getmempoolentry once per transaction.

Even if this pull was merged, it will just give one answer and I don't think it is clear whether this is the answer that users actually want? See #21422 (comment) and the previous discussion threads.
So anyone who wanted to give a better answer would have to implement some smarter algorithm in the application on a full copy of the mempool anyway, and this RPC wouldn't help with that.

@glozow
Copy link
Member

glozow commented Jun 29, 2023

Sparrow Wallet displays the fee rate histogram chart in the UI as another "tool in the toolbox". This allows a user to observe trends and adjust the fee rate to place a transaction in a particular fee rate band to suit their time preference.

a fee rate histogram is a useful tool in the toolbox when estimating fees

I think it's worth noting that a feerate histogram built from individual transaction feerates can be very misleading for fee estimation. A CPFP could look like n low feerate data points + 1 extremely high feerate data point. A few dozen 100kvB high-feerate transactions may dominate the block you are trying to get into, but they look identical to 100vB transactions as data points. As a user, I have no idea how I would look at a feerate histogram and get an idea of what feerate to put on my transaction.

I agree that fee estimation using the current contents of the mempool can be very useful. However, the process of taking current mempool contents and meaningfully translating them into high/med/low priority bands is quite involved. Linking one implementation which is a lot more accurate than using individual feerates, but quite computationally heavy and not necessarily appropriate for the oom'd raspis either.

Perhaps with cluster mempool we can expose a mempool entry's mining score, but that score still doesn't tell you much about how much block space is taken up by each data point.

leading any reasonable client implementation to maintain a replica of mempool entries

This is the most appropriate solution 👍. A simple map from txid to feerate would do if you're trying to keep a feerate histogram.

Also, you don't have to store the full entry in the application, just the txid and fee info would be enough? Finally, you'd only have to call getmempoolentry once per transaction.

Yeah, the nice thing about only needing fee and vsize is these will never change for a given txid in your mempool. You only need to query the entries you don't already have in your external data structure.

I think the getmempoolentry can be batched for a slight improvement.

On Core's side this idea seems the most worth pursuing, if somebody wants to open a pull.

@djkazic
Copy link

djkazic commented Jun 29, 2023

Hey, developer of matrix here.

I think a neural net fee estimator is going to be pretty performant compared to mempool space. My logger already takes into account things like CPFP and calculates the total package fee rate, and with approx 50MB of training data it's capable of estimating fees such that if the necessary rate goes down, it reacts more quickly that estimatesmartfee.

Like the built-in fee estimator, I do not use the current contents of the mempool. Rather, the training data packages up the time of tx seen, block count from seen to confirmed, and total package feerate. When it comes to inference-time, lighter nodes like RPis can download and validate a pretrained model. Hence, no fear of OOMing.

@craigraw
Copy link

The points regarding CPFP etc are well noted, although it's possible these are in practice more applicable to higher time preference fee estimations? Estimation aside, there are not insignificant client costs to maintaining a "txid to vsize and fee rate" mempool replica. By my measurement, this data structure with ~125,000 transactions requires ~14Mb of RAM.

Further, the initial load using getmempoolentry takes around 20 seconds on a locally running node, and would certainly be slower if the node was remote or running on an RPi. Because of this, I find it better to measure the time required for getrawmempool false and make a decision on whether to risk calling getrawmempool true instead of the getmempoolentry workaround. If a fee rate histogram RPC is not likely, I'd certainly be interested in a better solution to this.

@maflcko
Copy link
Member

maflcko commented Jun 29, 2023

By my measurement, this data structure with ~125,000 transactions requires ~14Mb of RAM.

If memory usage is the only concern, I am sure you could toss the transactions into (lossy) buckets and then subscribe via zmq to avoid duplicates, without having to remember txids, or individual size/fee details.

... applicable to higher time preference fee estimations?

If you only care about the top N blocks worth of transactions, the memory usage likely could also be reduced.

@santochibtc
Copy link

I don't see the use of the histogram for fee estimation. You can have a lot of transactions with minimum fee that generate a large bar dominating the histogram. I see more useful a representation like mempool.space showing the list of transactions ordered by decreasing fee and grouped by blocks. That is a list of pending blocks with associated feerate ranges.

@sipa
Copy link
Member

sipa commented Jul 6, 2023

@santochibtc "Feerate histogram" here is a bytes vs fee diagram, not a #tx vs fee diagram.

@santochibtc
Copy link

Thanks @sipa, yes, it is "a histogram of the fee rates paid by transactions in the memory pool, weighted by transaction size", but my point is still valid. If you want to include your tx in the next block you must estimate your fee based on the pending txs that are paying more per byte. I think that the histogram is not always a good picture of the mempool to estimate fees.

@nopara73
Copy link

nopara73 commented Jul 8, 2023

It's good enough.

luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Aug 16, 2023
Co-authored-by: Jonas Schnelli <dev@jonasschnelli.ch>
Co-authored-by: Jon Atack <jon@atack.com>

Github-Pull: bitcoin#21422
Rebased-From: 0b87ba9
luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Aug 29, 2023
Co-authored-by: Jonas Schnelli <dev@jonasschnelli.ch>
Co-authored-by: Jon Atack <jon@atack.com>

Github-Pull: bitcoin#21422
Rebased-From: 0b87ba9
luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Jan 30, 2024
Co-authored-by: Jonas Schnelli <dev@jonasschnelli.ch>
Co-authored-by: Jon Atack <jon@atack.com>

Github-Pull: bitcoin#21422
Rebased-From: 0b87ba9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet